ABSTRACT

In a paper about pretty printing J. Hughes introduced two
fundamental techniques for deriving programs from their
specification, where a specification consists of a signature
and properties that the operations of the signature are
required to satisfy. Briefly, the first technique, the term
implementation, represents the operations by terms and works
by defining a mapping from operations to observations;
this mapping can be seen as defining a simple interpreter.
The second, the context-passing implementation, represents
operations as functions from their calling context to
observations. We apply both techniques to derive a backtracking
monad transformer that adds backtracking to an arbitrary
monad. In addition to the usual backtracking operations,
failure and nondeterministic choice, the Prolog cut and an
operation for delimiting the effect of a cut are supported.
1. INTRODUCTION

Why should one derive a program from its specification?
Ideally, a derivation explains and motivates the various
design choices taken in a particular implementation. At best
a derivation eliminates the need for so-called eureka steps,
which are usually inevitable if a program is explained, say,
by means of example.

In a paper about pretty printing J. Hughes [6] introduced
two fundamental techniques for deriving programs from their
specification. Both techniques provide the programmer with
considerable guidance in the process of program derivation.
To illustrate their utility and versatility we apply the
framework to derive several monad transformers, which among
other things add backtracking to an arbitrary monad.

Briefly, a monad transformer is a mapping on monads that 
augments a given monad by a certain computational feature 
such as state, exceptions, or nondeterminism. Traditionally, 
monad transformers are introduced in a single big eureka 
step. Even the recent introductory textbook on functional 
programming [2] fails to explain the particular definitions 
of monad transformers. After defining an exception monad 
transformer R. Bird remarks: “Why have we chosen to write 
[ ... ]? The answer is: because it works.” Building upon
Hughes’ techniques we will try to provide a more satisfying 
answer. The reader should be prepared, however, that the 
results are somewhat different from the standard textbook 
examples. 

The paper is organized as follows. Sec. 2 reviews monads
and monad transformers. Sec. 3 introduces Hughes’
techniques by means of a simple example. Sec. 4 applies the
framework to derive a backtracking monad transformer that
adds backtracking to an arbitrary monad. Sec. 5
extends the design of Sec. 4 to include additional control
constructs: Prolog’s cut and an operation for delimiting the
effect of cut. Finally, Sec. 6 concludes and points out
directions for future work.

2. PRELIMINARIES 

Monads have been proposed by Moggi as a means to
structure denotational semantics [11, 12]. Wadler popularized
Moggi’s idea in the functional programming community by
using monads to structure functional programs [15, 16, 17].
In Haskell 98 [13] monads are captured by the class definition
in Fig. 1. The essential idea of monads is to distinguish
between computations and values. This distinction is reflected
on the type level: an element of m a represents a
computation that yields a value of type a. The trivial computation


    class Monad m where
      return :: a → m a
      (>>=)  :: m a → (a → m b) → m b
      (>>)   :: m a → m b → m b
      fail   :: String → m a
      m >> n = m >>= const n
      fail s = error s

Figure 1: The Monad class.


that immediately returns the value a is denoted return a.
The operator (>>=), commonly called ‘bind’, combines two
computations: m >>= k applies k to the result of the
computation m. The derived operation (>>) provides a handy
shortcut if one is not interested in the result of the first
computation. The operation fail is useful for signaling error
conditions and will be used to this effect. Note that fail does
not stem from the mathematical concept of a monad, but
has been added to the monad class for pragmatic reasons,
see [13, Sec. 3.14].

The operations are required to satisfy the following
so-called monad laws.

    return a >>= k      = k a                            (M1)
    m >>= return        = m                              (M2)
    (m >>= k1) >>= k2   = m >>= (λa → k1 a >>= k2)       (M3)

For an explanation of the laws we refer the reader to [2,
Sec. 10.3]. Note that fail is intentionally left unspecified.

Different monads are distinguished by the computational
features they support. Each computational feature is
typically accessed through a number of additional operations.
For instance, a backtracking monad additionally supports
the operations false and (|) denoting failure and
nondeterministic choice. It is relatively easy to construct a monad
that supports only a single computational feature.
Unfortunately, there is no uniform way of combining two monads
which support different computational features. The reason
is simply that two features may interact in different ways.
There is, however, a uniform method for augmenting a given
monad by a certain computational feature. This method is
captured by the following class definition which introduces
monad transformers [9].

    class Transformer τ where
      promote :: (Monad m) => m a → τ m a
      observe :: (Monad m) => τ m a → m a

A monad transformer is basically a type constructor τ that
takes a monad m to a monad τ m. It must additionally
provide two operations: an operation for embedding
computations from the underlying monad into the transformed
monad and an inverse operation, which allows us to observe
‘augmented’ computations in the underlying monad. Since
observe forgets structure, it will in general be a partial
function. In what follows we will abbreviate observe by ω and
promote by π. Turning to the laws we require promotion to
respect the monad operations.

    π (return a)  = return a             (P1)
    π (m >>= k)   = π m >>= (π · k)      (P2)

These laws determine π as a monad morphism. In general, π
should respect every operation the underlying monad
provides in order to guarantee that a program that does not
use new features behaves the same in the underlying and in
the transformed monad. The counterpart of π is not quite
a monad morphism.

    ω (return a)   = return a            (O1)
    ω (π m >>= k)  = m >>= (ω · k)       (O2)

The second law is weaker than the corresponding law for π.
It is unreasonable to expect more since computations in τ m
can, in general, not be mimicked in m.

3. ADDING ABNORMAL TERMINATION 

This section reviews Hughes’ technique by means of a
simple example. We show how to augment a given monad by an
operation that allows one to terminate a computation
abnormally. Monads with additional features are introduced
as subclasses of Monad.

    type Exception = String

    class (Monad m) => Raise m where
      raise :: Exception → m a

The call raise e terminates the current computation. This
property is captured by the law:

    raise e >>= k = raise e,             (R1)

which formalizes that raise e is a left zero of (>>=). Now,
let us try to derive a monad transformer for this feature.
Beforehand, we must determine how raise e is observed in
the base monad. We specify:

    ω (raise e) = fail e,                (O3)

which appears to be the only reasonable choice since we
know nothing of the underlying monad.

Remark. We do not consider an operation for trapping
exceptions (such as handle) in order to keep the introductory
example short and simple. It is worth noting, however,
that the derivation of a fully-fledged exception monad
transformer proceeds similarly to the derivation given in Sec. 5.

3.1 A term implementation 

The term implementation represents operations simply by
terms of the algebra and works by defining an interpreter
for the language. Since we have four operations (return,
(>>=), raise, and π) the datatype that implements the term
algebra consequently comprises four constructors. We adopt
the convention that monad transformers are given names
that are all in upper case. For the constructor names we
re-use the names of the operations with the first letter in
upper case; operators like (>>=) are prefixed by a colon.

    data RAISE m a
      = Return a
      | ∀b. (RAISE m b) :>>= (b → RAISE m a)
      | Raise Exception
      | Promote (m a)

Note that the definition involves an existentially quantified
type variable [8] in the type of (:>>=). We use GHC/Hugs
syntax for existential quantification: the existentially
quantified variable is bound by an explicit universal quantifier
written before the constructor.




    data RAISE m a = Return a
                   | ∀b. (RAISE m b) :>>= (b → RAISE m a)
                   | Raise Exception
                   | Promote (m a)

    instance Monad (RAISE m) where
      return = Return
      (>>=)  = (:>>=)

    instance Raise (RAISE m) where
      raise = Raise

    instance Transformer RAISE where
      promote                         = Promote
      observe (Return a)              = return a
      observe (Return a :>>= k)       = observe (k a)
      observe ((m :>>= k1) :>>= k2)   = observe (m :>>= (λa → k1 a :>>= k2))
      observe (Raise e :>>= k)        = fail e
      observe (Promote m :>>= k)      = m >>= (observe · k)
      observe (Raise e)               = fail e
      observe (Promote m)             = m

Figure 2: A term implementation of RAISE.


Now, each of the operations return, (>>=), raise, and π
is implemented by the corresponding constructor. In other
words, the operations do nothing. All the work is performed
by ω, which can be seen as defining a tiny interpreter for the
monadic language. Except for one case the definition of ω
is straightforward.

    ω (Return a)   = return a
    ω (m :>>= k)   =
    ω (Raise e)    = fail e
    ω (Promote m)  = m

Can we fill in the blank on the right-hand side? It appears
impossible to define ω (m :>>= k) in terms of its constituents.
The only way out of this dilemma is to make a further case
distinction on m:

    ω (Return a :>>= k)       = ω (k a)
    ω ((m :>>= k1) :>>= k2)   = ω (m :>>= (λa → k1 a :>>= k2))
    ω (Raise e :>>= k)        = fail e
    ω (Promote m :>>= k)      = m >>= (ω · k).

Voila. Each equation is a simple consequence of the monad
laws and the laws for ω. In particular, the second equation
employs (M3), the associative law for (>>=), to reduce the
size of (:>>=)’s first argument. This rewrite step is analogous
to rotating a binary tree to the right. Fig. 2 summarizes the
term implementation. Note that in the sequel we will omit
trivial instance declarations like Monad (RAISE m) and
Raise (RAISE m).
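The term implementation can be tried out in a current GHC. The following is a minimal sketch under stated assumptions, not the paper's code: the names Bind and RaiseE stand in for (:>>=) and Raise (the latter to avoid a clash with the type name), the Functor and Applicative instances are added because modern GHC requires them, and fail is taken from the MonadFail class, since fail is no longer a method of Monad.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

type Exception = String

-- Free term algebra: one constructor per operation.
-- Bind and RaiseE are hypothetical names for (:>>=) and Raise.
data Raise m a
  = Return a
  | forall b. Bind (Raise m b) (b -> Raise m a)
  | RaiseE Exception
  | Promote (m a)

-- Required by modern GHC; not part of the original derivation.
instance Functor (Raise m) where
  fmap f m = Bind m (Return . f)

instance Applicative (Raise m) where
  pure = Return
  mf <*> mx = Bind mf (\f -> Bind mx (Return . f))

-- The operations do nothing: bind just builds a term.
instance Monad (Raise m) where
  (>>=) = Bind

raise :: Exception -> Raise m a
raise = RaiseE

promote :: m a -> Raise m a
promote = Promote

-- The interpreter: a further case distinction on the left
-- argument of Bind, using (M1), (M3), (R1), (O2) and (O3).
observe :: MonadFail m => Raise m a -> m a
observe (Return a)            = return a
observe (Bind (Return a) k)   = observe (k a)
observe (Bind (Bind m k1) k2) = observe (Bind m (\a -> Bind (k1 a) k2))
observe (Bind (RaiseE e) _)   = fail e
observe (Bind (Promote m) k)  = m >>= (observe . k)
observe (RaiseE e)            = fail e
observe (Promote m)           = m
```

With Maybe as the base monad, fail yields Nothing, so a raised exception is observed as Nothing while a promoted computation passes through unchanged.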

What about correctness? First of all, the definition of ω
is exhaustive. It is furthermore terminating since the size of
(:>>=)’s left argument is steadily decreasing. We can
establish termination using a so-called polynomial interpretation
of the operations [4]:

    Return^τ a  = 1              Raise^τ e    = 1
    m :>>^τ n   = 2 × m + n      Promote^τ m  = 1.

A multivariate polynomial op^τ of n variables is associated
with each n-ary operation op. For each equation ω l =
... ω r ... we must show that τ l > τ r for all
variables (ranging over positive integers) where τ is given by
τ (op e1 ... en) = op^τ (τ e1) ... (τ en). Note that we
consider bind only for the special case that the result of the
first argument is ignored. The inclusion of m :>>= k in its
full generality is feasible but technically more involved since
the interpretation of k depends on the value m computes.

Does the implementation satisfy its specification? Since
we are working in the free algebra, the laws do not hold: the
expressions Return a and Return a :>>= Return, for example,
are distinct, unrelated terms. The laws of the specification
only hold under observation. The monad laws become:

    ω (return a >>= k)      = ω (k a)
    ω (m >>= return)        = ω m
    ω ((m >>= k1) >>= k2)   = ω (m >>= (λa → k1 a >>= k2)).

The first and the third are direct consequences of ω’s
definition. The second can be shown by induction on m.
Fortunately, we can live with the weakened laws, since the only
way to run computations of type RAISE m is to use ω.

3.2 A simplified term implementation 

Can we do better than the naive term implementation?
A major criticism of the first attempt is that the operations
do not exploit the algebraic laws. It is conceivable that we
can work with a subset of the term algebra. For instance,
we need not represent both Raise e and Raise e :>>= Return.
A rather systematic way to determine the required subset
of terms is to program a simplifier for the datatype RAISE,
which exploits the algebraic laws as far as possible. It turns
out that we only need to modify ω slightly.

    σ                         :: RAISE m a → RAISE m a
    σ (Return a)              = Return a
    σ (Return a :>>= k)       = σ (k a)
    σ ((m :>>= k1) :>>= k2)   = σ (m :>>= (λa → k1 a :>>= k2))
    σ (Raise e :>>= k)        = Raise e
    σ (Promote m :>>= k)      = Promote m :>>= (σ · k)
    σ (Raise e)               = Raise e
    σ (Promote m)             = Promote m

Inspecting the right-hand sides we see that we require (:>>=)
only in conjunction with Promote. Since π m is furthermore
equivalent to π m >>= return we can, in fact, restrict ourselves
to the following subset of the term algebra.

    data RAISE m a
      = Return a
      | ∀b. PromoteBind (m b) (b → RAISE m a)
      | Raise Exception

Following Hughes [6] we call elements of the new datatype
simplified terms. We avoid the term normal form or
canonical form since distinct terms may not necessarily be
semantically different. For instance, return a can be represented
both by Return a and PromoteBind (return a) Return.
Nonetheless, using this representation the definition of ω is
much simpler. It is, in fact, directly based on the laws
(O1)–(O3). The complete implementation appears in Fig. 3. If
we are only interested in defining a monad (not a monad
transformer), then we can omit the constructor PromoteBind.
The resulting datatype corresponds exactly to the standard
definition of the exception monad.
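A runnable rendering of the simplified terms might look as follows. This is a sketch under assumptions: RaiseE is a hypothetical constructor name (chosen to avoid a clash with the type name), the Functor and Applicative instances are boilerplate that modern GHC demands, and fail comes from MonadFail rather than Monad.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

type Exception = String

-- Simplified terms: bind survives only in combination with promote.
data Raise m a
  = Return a
  | forall b. PromoteBind (m b) (b -> Raise m a)
  | RaiseE Exception

-- Boilerplate required by modern GHC (not in the original figure).
instance Functor (Raise m) where
  fmap f m = m >>= (Return . f)

instance Applicative (Raise m) where
  pure = Return
  mf <*> mx = mf >>= \f -> mx >>= \x -> Return (f x)

-- Bind now does real work: it simplifies as it goes, using
-- (M1), (M3) and (R1).
instance Monad (Raise m) where
  Return a         >>= k  = k a
  PromoteBind m k1 >>= k2 = PromoteBind m (\a -> k1 a >>= k2)
  RaiseE e         >>= _  = RaiseE e

raise :: Exception -> Raise m a
raise = RaiseE

promote :: m a -> Raise m a
promote m = PromoteBind m Return

-- One equation per law (O1)-(O3).
observe :: MonadFail m => Raise m a -> m a
observe (Return a)        = return a
observe (PromoteBind m k) = m >>= (observe . k)
observe (RaiseE e)        = fail e
```

Dropping PromoteBind leaves a type with the two alternatives Return a and RaiseE e, which is the shape of the usual exception monad.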

What about efficiency? The naive implementation, or
rather the first definition of ω, has a running time that is
proportional to the size of the computation. Unfortunately,
the ‘improved’ term implementation has a quadratic
worst-case behaviour. Consider the expression

    ω (··· ((π (return 0) >>= inc) >>= inc) ··· >>= inc),

where inc is given by inc n = π (return (n + 1)). Since the
amortized running time of bind is proportional to the size
of its first argument, it takes O(n²) steps to evaluate the
expression above. The situation is analogous to flattening a
binary tree. Bad luck.


3.3 A context-passing implementation 

Since we cannot improve the implementation of the
operations without sacrificing the runtime efficiency, let us try
to improve the definition of ω. While rewriting ω we will
work out a specification for the final context-passing
implementation. For a start, we can avoid some pattern matching
if we specialize ω for op :>>= k. To this end we replace the
equations concerning (:>>=) by the single equation

    ω (op :>>= c) = ω1 op c

and define ω1 by

    ω1 (Return a) c    = ω (c a)
    ω1 (m :>>= k) c    = ω1 m (λa → k a :>>= c)
    ω1 (Raise e) c     = fail e
    ω1 (Promote m) c   = m >>= λa → ω (c a).


Interestingly, the parameter c is used twice in conjunction
with ω. In an attempt to eliminate the mutual recursive
dependence on ω we could try to pass ω · c as a parameter
instead of c. This variation of ω1, which we call ω̄, can be
specified as follows.

    ω̄ op c̄ = ω (op :>>= c)
      ⇐ ∀a. c̄ a = ω (c a)                (1)

Let us derive the definition of ω̄ for op = Return a. We
assume that precondition (1) holds (note that the equation
number refers to the precondition only) and reason:

    ω̄ (Return a) c̄
      = { specification and assumption (1) }
    ω (Return a :>>= c)
      = { definition ω }
    ω (c a)
      = { assumption (1) }
    c̄ a.

The calculations for Promote m and Raise e are similar. It
remains to infer the definition for op = (m :>>= k):

    ω̄ (m :>>= k) c̄
      = { specification and assumption (1) }
    ω ((m :>>= k) :>>= c)
      = { definition ω }
    ω (m :>>= (λa → k a :>>= c))
      = { specification }
    ω̄ m (λa → ω (k a :>>= c))
      = { specification and assumption (1) }
    ω̄ m (λa → ω̄ (k a) c̄).


Voila. The dependence on ω has vanished. To summarize,
ω̄ is given by

    ω̄ (Return a)    = λc̄ → c̄ a
    ω̄ (m :>>= k)    = λc̄ → ω̄ m (λa → ω̄ (k a) c̄)
    ω̄ (Raise e)     = λc̄ → fail e
    ω̄ (Promote m)   = λc̄ → m >>= c̄.


Note that the constructors appear only on the left-hand
sides. This means that we are even able to remove the
interpretative layer, i.e. return a can be implemented directly
by λc̄ → c̄ a instead of Return. In general, we consistently
replace ω̄ op by op̄. Silently, we have converted the term
implementation into a context-passing implementation. To see
why the term ‘context-passing’ is appropriate, consider the
final specification of the context-passing implementation.

    op̄ c̄ = ω (op >>= c)
      ⇐ ∀a. c̄ a = ω (c a)                (2)


The parameter c̄ of op̄ can be seen as a representation of
op’s calling context ω (• >>= c); we represent a context
by an expression that has a hole in it. This is the nub of
the story: every operation knows the context in which it is
called and it is furthermore able to access and to rearrange
the context. This gives the implementor a much greater
freedom of manoeuvre as compared to the simplified term
algebra. For instance, (>>=) can use the associative law to
improve efficiency. By contrast, (>>=) of the simplified term
variety does not know of any outer binds and consequently
falls into the efficiency trap.

It is quite instructive to infer the operations of the
context-passing implementation from scratch using the specification
above. Fig. 4 summarizes the calculations. Interestingly,
each monad law, the law for raise, and each law for ω is
invoked exactly once. In other words, the laws of the
specification are necessary and sufficient for deriving an
implementation.

It remains to determine the type of the new monad
transformer. This is most easily accomplished by inspecting the

    data RAISE m a = Return a
                   | ∀b. PromoteBind (m b) (b → RAISE m a)
                   | Raise Exception

    instance Monad (RAISE m) where
      return a                    = Return a
      Return a >>= k              = k a
      (PromoteBind m k1) >>= k2   = PromoteBind m (λa → k1 a >>= k2)
      Raise e >>= k               = Raise e

    instance Raise (RAISE m) where
      raise e = Raise e

    instance Transformer RAISE where
      promote m                   = PromoteBind m Return
      observe (Return a)          = return a
      observe (PromoteBind m k)   = m >>= (observe · k)
      observe (Raise e)           = fail e

Figure 3: A simplified term implementation of RAISE.


definition of π. Note that π m equals (>>=) m and recall
that (>>=) possesses the type ∀a.∀b. m a → (a → m b) →
m b, which is equivalent to ∀a. m a → (∀b. (a → m b) →
m b). Consequently, the new transformer has type ∀b. (a →
m b) → m b. So, while the term implementation requires
existential quantification, the context-passing
implementation makes use of universal quantification. The final
implementation appears in Fig. 5.¹ The cognoscenti would
certainly recognize that the implementation is identical with
the definition of the continuation monad transformer [9].
Only the types are different: RAISE involves rank-2 types
while the continuation monad transformer is additionally
parameterized with the answer type: CONT ans m a =
(a → m ans) → m ans. The transformer RAISE m
constitutes the smallest extension of m that allows one to
add raise. Note, for instance, that callcc is definable in
CONT ans m but not in RAISE m. We will see in Sec. 4.3
that rank-2 types have advantages over parameterized types.
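The context-passing transformer can be written down in GHC with the RankNTypes extension. The sketch below makes several assumptions that are not in the paper: a newtype wrapper with the hypothetical field name runRaise replaces the type synonym, Functor and Applicative instances are added because GHC requires them, and raise needs a MonadFail constraint because fail is no longer a Monad method.

```haskell
{-# LANGUAGE RankNTypes #-}

type Exception = String

-- The answer type b is universally quantified (rank-2 type).
newtype Raise m a = Raise { runRaise :: forall b. (a -> m b) -> m b }

-- Boilerplate required by modern GHC.
instance Functor (Raise m) where
  fmap f m = Raise (\c -> runRaise m (c . f))

instance Applicative (Raise m) where
  pure a = Raise (\c -> c a)
  mf <*> mx = Raise (\c -> runRaise mf (\f -> runRaise mx (c . f)))

-- Bind hands the rest of the computation to m as its context.
instance Monad (Raise m) where
  m >>= k = Raise (\c -> runRaise m (\a -> runRaise (k a) c))

-- raise ignores its context altogether.
raise :: MonadFail m => Exception -> Raise m a
raise e = Raise (\_ -> fail e)

promote :: Monad m => m a -> Raise m a
promote m = Raise (\c -> m >>= c)

-- Observation supplies the empty context.
observe :: Monad m => Raise m a -> m a
observe m = runRaise m return
```

Because bind only composes continuations, a left-nested chain of binds such as foldl (>>=) (promote (Just 0)) (replicate 1000 inc) is not re-traversed at each step, in contrast to the simplified term variant.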

4. ADDING BACKTRACKING 

By definition, a backtracking monad is a monad with two
additional operations: the constant false, which denotes
failure, and the binary operation (|), which denotes
nondeterministic choice. The class definition contains a third
operation, termed cons, which provides a handy shortcut for
return a | m.

    class (Monad m) => Backtr m where
      false    :: m a
      (|)      :: m a → m a → m a
      cons     :: a → m a → m a
      cons a m = return a | m

The operations are required to satisfy the following laws.

    false | m        = m                          (B1)
    m | false        = m                          (B2)
    (m | n) | o      = m | (n | o)                (B3)
    false >>= k      = false                      (B4)
    (m | n) >>= k    = (m >>= k) | (n >>= k)      (B5)

That is, false and (|) form a monoid; false is a left zero
of (>>=), and (>>=) distributes leftward through (|). Now,
since we aim at defining a backtracking monad transformer,
we must also specify the interaction of promoted operations
with (|):

    (π m >>= k) | n  = π m >>= λa → k a | n.      (B6)

Consider π m as a deterministic computation, i.e. a
computation that succeeds exactly once. Then (B6) formalizes our
intuition that a deterministic computation can be pushed
out of a disjunction’s left branch. Finally, we must specify
how the backtracking operations are observed in the base
monad.

    ω false            = fail "false"             (O4)
    ω (return a | m)   = return a                 (O5)

So we can observe the first answer of a nondeterministic
computation.

¹ Note that RAISE must actually be defined using newtype
instead of type. This, however, introduces an additional
data constructor that affects the readability of the code.
Instead we employ type declarations as if they worked as
newtype declarations.

4.1 A term implementation

The free term algebra of the backtracking monad is given
by the following type definition.


    data BACKTR m a
      = Return a
      | ∀b. (BACKTR m b) :>>= (b → BACKTR m a)
      | False
      | BACKTR m a :| BACKTR m a
      | Promote (m a)

Let us try to derive an interpreter for this language. The
definition of the base cases follows immediately from the
specification. For m :>>= k we obtain:

    ω (Return a :>>= k)       = ω (k a)
    ω ((m :>>= k1) :>>= k2)   = ω (m :>>= (λa → k1 a :>>= k2))
    ω (False :>>= k)          = fail "false"
    ω ((m :| n) :>>= k)       = ω ((m :>>= k) :| (n :>>= k))
    ω (Promote m :>>= k)      = m >>= (ω · k).







    (return a)‾ c̄
      = { specification and assumption (2) }
    observe (return a >>= c)
      = { (M1) }
    observe (c a)
      = { assumption (2) }
    c̄ a

    (m >>= k)‾ c̄
      = { specification and assumption (2) }
    observe ((m >>= k) >>= c)
      = { (M3) }
    observe (m >>= (λa → k a >>= c))
      = { specification }
    m̄ (λa → observe (k a >>= c))
      = { specification and assumption (2) }
    m̄ (λa → (k a)‾ c̄)

    (raise e)‾ c̄
      = { specification and assumption (2) }
    observe (raise e >>= c)
      = { (R1) }
    observe (raise e)
      = { (O3) }
    fail e

    (promote m)‾ c̄
      = { specification and assumption (2) }
    observe (promote m >>= c)
      = { (O2) }
    m >>= λa → observe (c a)
      = { assumption (2) }
    m >>= c̄

    observe m
      = { (M2) }
    observe (m >>= return)
      = { specification }
    m̄ (λa → observe (return a))
      = { (O1) }
    m̄ return

Figure 4: Deriving a context-passing implementation of RAISE.


    type RAISE m a = ∀b. (a → m b) → m b

    instance (Monad m) => Monad (RAISE m) where
      return a = λc → c a
      m >>= k  = λc → m (λa → k a c)

    instance (Monad m) => Raise (RAISE m) where
      raise e = λc → fail e

    instance Transformer RAISE where
      promote m = λc → m >>= c
      observe m = m return

Figure 5: A context-passing implementation of RAISE.




Similarly, for m :| n we make a case distinction on m:

    ω (Return a :| f)      = return a
    ω ((m :>>= k) :| f)    =
    ω (False :| f)         = ω f
    ω ((m :| n) :| f)      = ω (m :| (n :| f))
    ω (Promote m :| f)     = m

Unfortunately, one case remains. There is no obvious way
to simplify ω ((m :>>= k) :| f). As usual, we help ourselves by
making a further case distinction on m.

    ω ((Return a :>>= k) :| f)       = ω (k a :| f)
    ω (((m :>>= k1) :>>= k2) :| f)   = ω ((m :>>= (λa → k1 a :>>= k2)) :| f)
    ω ((False :>>= k) :| f)          = ω f
    ω (((m :| n) :>>= k) :| f)       = ω ((m :>>= k) :| ((n :>>= k) :| f))
    ω ((Promote m :>>= k) :| f)      = m >>= λa → ω (k a :| f)

Voila. We have succeeded in building an interpreter for 
backtracking. Fig. 6 lists the complete implementation. 

Now, what about correctness? Clearly, the case
distinction is exhaustive. To establish termination we can use the
following polynomial interpretation.

    Return^τ a  = 2              m :|^τ n     = 2 × m + n
    m :>>^τ n   = m² × n         Promote^τ m  = 2
    False^τ     = 2

As before, the laws of the specification only hold under
observation.

4.2 A simplified term implementation 

Let us take a brief look at the simplified term
implementation. Inspecting the definition of ω (recall that a simplifier
is likely to make the same case distinction as ω) we see that
we need at most six terms: False, Return a, Return a :| f,
Promote m, Promote m :>>= k, and Promote m :| f. We
can eliminate three of them using return a = cons a false,
π m = π m >>= return, and π m | f = π m >>= λa → cons a f.
This explains the following definition of simplified terms.

    data BACKTR m a
      = False
      | Cons a (BACKTR m a)
      | ∀b. PromoteBind (m b) (b → BACKTR m a)

In essence, the simplified term algebra is an extension of the 
datatype of parametric lists with False corresponding to [] 
and Cons corresponding to (:). The additional constructor 
PromoteBind makes the difference between a monad and 
a monad transformer. Note that the standard list monad 
transformer, LIST m a = m [a], can only be applied to 
so-called commutative monads [7]. By contrast, BACKTR 
works for arbitrary monads. 
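The paper does not spell out the operations on simplified backtracking terms, so the following is only one possible sketch, read off from the laws (B1), (B3), (B4), (B5) and (B6) and the solve equations (S1)–(S3). The names FalseB and orElse are hypothetical (orElse plays the role of the choice operator), and the Functor and Applicative instances are boilerplate for modern GHC.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Simplified terms: a list type extended by PromoteBind.
data Backtr m a
  = FalseB
  | Cons a (Backtr m a)
  | forall b. PromoteBind (m b) (b -> Backtr m a)

-- Choice, defined like list append; the last clause follows (B6).
orElse :: Backtr m a -> Backtr m a -> Backtr m a
orElse FalseB            n = n
orElse (Cons a m)        n = Cons a (orElse m n)
orElse (PromoteBind m k) n = PromoteBind m (\a -> orElse (k a) n)

instance Functor (Backtr m) where
  fmap g m = m >>= (pure . g)

instance Applicative (Backtr m) where
  pure a = Cons a FalseB          -- return a = cons a false
  mf <*> mx = mf >>= \g -> mx >>= (pure . g)

instance Monad (Backtr m) where
  FalseB           >>= _  = FalseB                    -- (B4)
  Cons a m         >>= k  = orElse (k a) (m >>= k)    -- (M1), (B5)
  PromoteBind m k1 >>= k2 = PromoteBind m (\a -> k1 a >>= k2)

promote :: m a -> Backtr m a
promote m = PromoteBind m pure

-- Collect all answers in the base monad.
solve :: Monad m => Backtr m a -> m [a]
solve FalseB            = return []
solve (Cons a m)        = fmap (a :) (solve m)
solve (PromoteBind m k) = m >>= (solve . k)
```

With Maybe as the base monad, a promoted Nothing anywhere along the search aborts the whole collection, which illustrates that effects of the underlying monad are interleaved with the answers rather than confined to the outside as in m [a].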

4.3 A context-passing implementation 

In Sec. 3.3 we have seen that the context-passing
implementation essentially removes the interpretative layer from
the ‘naive’ term implementation. If we apply the same
steps, we can derive very systematically a context-passing
implementation of backtracking. We leave the details to the
reader and sketch only the main points. First, from the case
analysis ω performs we may conclude that the most
complex context has the form ω (• :>>= c :| f). All other contexts
can be rewritten into this form. Second, if we inspect the








    data BACKTR m a
      = Return a
      | ∀b. (BACKTR m b) :>>= (b → BACKTR m a)
      | False
      | BACKTR m a :| BACKTR m a
      | Promote (m a)

    instance Transformer BACKTR where
      promote                                = Promote
      observe (Return a)                     = return a
      observe (Return a :>>= k)              = observe (k a)
      observe ((m :>>= k1) :>>= k2)          = observe (m :>>= (λa → k1 a :>>= k2))
      observe (False :>>= k)                 = fail "false"
      observe ((m :| n) :>>= k)              = observe ((m :>>= k) :| (n :>>= k))
      observe (Promote m :>>= k)             = m >>= (observe · k)
      observe False                          = fail "false"
      observe (Return a :| f)                = return a
      observe ((Return a :>>= k) :| f)       = observe (k a :| f)
      observe (((m :>>= k1) :>>= k2) :| f)   = observe ((m :>>= (λa → k1 a :>>= k2)) :| f)
      observe ((False :>>= k) :| f)          = observe f
      observe (((m :| n) :>>= k) :| f)       = observe ((m :>>= k) :| ((n :>>= k) :| f))
      observe ((Promote m :>>= k) :| f)      = m >>= λa → observe (k a :| f)
      observe (False :| f)                   = observe f
      observe ((m :| n) :| f)                = observe (m :| (n :| f))
      observe (Promote m :| f)               = m
      observe (Promote m)                    = m

Figure 6: A term implementation of BACKTR.


equations that are concerned with ω (• :>>= c :| f) we see that
f appears once in the context ω •. Likewise, c is used twice
in the context ω (• a :| f). These observations motivate the
following specification.

    op̄ c̄ f̄ = ω (op >>= c | f)
      ⇐ f̄ = ω f                                          (3)
      ∧ ∀f' f̄'. (∀a. c̄ a f̄' = ω (c a | f')) ⇐ f̄' = ω f'   (4)

The nice thing about Hughes’ technique is that mistakes
made at this point will be discovered later when the
operations are derived. For instance, it may seem unnecessary
that c̄ is parameterized with f̄'. However, if we simply
postulate ∀a. c̄ a = ω (c a | f), then we will not be able to derive a
definition for (|). Better still, one can develop the
specification above while making the calculations. The derivation of
false, for instance, motivates assumption (3); the derivation
of return suggests either ∀a. c̄ a = ω (c a | f) or
assumption (4), and the derivation of (|) confirms that (4) is the
right choice. The complete derivation appears in Fig. 7.
Interestingly, each equation of the specification is invoked
exactly once.

It remains to determine the type of the backtracking monad
transformer. If we assume that the second parameter, the
so-called failure continuation, has type m b, then the first
parameter, the so-called success continuation, is of type
a → m b → m b. It follows that the type of the new
transformer is ∀b. (a → m b → m b) → m b → m b. Again,
the answer type is universally quantified. We will see shortly
why this is a reasonable choice. Fig. 8 summarizes the
implementation.

Reconsider Fig. 7 and note that the derivation of return,
(>>=), false, and (|) is completely independent of ω’s
specification. The laws (O4) and (O5) are only required in the
derivation of ω. Only π relies on (O2) which, however,
appears to be the only sensible way to observe promoted
operations. This suggests that we can define different observations
without changing the definitions of the other operations. In
other words, we may generalize the specification as follows
(here φ is an arbitrary observer function).

    op̄ c̄ f̄ = φ (op >>= c | f)                            (5)
      ⇐ f̄ = φ f
      ∧ ∀f' f̄'. (∀a. c̄ a f̄' = φ (c a | f')) ⇐ f̄' = φ f'
      ∧ ∀m k. φ (π m >>= k) = m >>= (φ · k)

To illustrate the use of the generalized specification assume
that we want to collect all solutions of a nondeterministic
computation. To this end we specify an observation solve of
type (Monad m) => BACKTR m a → m [a]:

    solve false            = return []              (S1)
    solve (return a | m)   = a ◁ solve m            (S2)
    solve (π m >>= k)      = m >>= (solve · k),     (S3)

where (◁) is given by

    (◁)    :: (Monad m) => a → m [a] → m [a]
    a ◁ ms = ms >>= λas → return (a : as).

An implementation for solve can be readily derived if we
specialize (5) for c = return and f = false. We obtain:

    φ op = op̄ (⊕) e
      ⇐ e = φ false
      ∧ ∀a f'. φ (return a | f') = a ⊕ φ f'
      ∧ ∀m k. φ (π m >>= k) = m >>= (φ · k).



    (return a)‾ c̄ f̄
      = { specification and assumptions (3) & (4) }
    observe (return a >>= c | f)
      = { (M1) }
    observe (c a | f)
      = { assumptions (3) & (4) }
    c̄ a f̄

    (m >>= k)‾ c̄ f̄
      = { specification and assumptions (3) & (4) }
    observe ((m >>= k) >>= c | f)
      = { (M3) }
    observe (m >>= (λa → k a >>= c) | f)
      = { specification and assumption (3) }
    m̄ (λa f̄' → observe (k a >>= c | f')) f̄
      = { specification and assumption (4) }
    m̄ (λa f̄' → (k a)‾ c̄ f̄') f̄

    false‾ c̄ f̄
      = { specification and assumptions (3) & (4) }
    observe (false >>= c | f)
      = { (B4) }
    observe (false | f)
      = { (B1) }
    observe f
      = { assumption (3) }
    f̄

    (m | n)‾ c̄ f̄
      = { specification and assumptions (3) & (4) }
    observe ((m | n) >>= c | f)
      = { (B5) }
    observe ((m >>= c | n >>= c) | f)
      = { (B3) }
    observe (m >>= c | (n >>= c | f))
      = { specification and assumption (4) }
    m̄ c̄ (observe (n >>= c | f))
      = { specification and assumptions (3) & (4) }
    m̄ c̄ (n̄ c̄ f̄)

    (promote m)‾ c̄ f̄
      = { specification and assumptions (3) & (4) }
    observe (promote m >>= c | f)
      = { (B6) }
    observe (promote m >>= (λa → c a | f))
      = { (O2) }
    m >>= λa → observe (c a | f)
      = { assumptions (3) & (4) }
    m >>= λa → c̄ a f̄

    observe m
      = { (M2) and (B2) }
    observe (m >>= return | false)
      = { specification }
    m̄ (λa f̄' → observe (return a | f')) (observe false)
      = { (O4) }
    m̄ (λa f̄' → observe (return a | f')) (fail "false")
      = { (O5) }
    m̄ (λa f̄' → return a) (fail "false")

Figure 7: Deriving a context-passing implementation of BACKTR.


    type BACKTR m a = ∀b. (a → m b → m b) → m b → m b

    instance (Monad m) => Monad (BACKTR m) where
      return a = λc → c a
      m >>= k  = λc → m (λa → k a c)

    instance (Monad m) => Backtr (BACKTR m) where
      false = λc → id
      m | n = λc → m c · n c

    instance Transformer BACKTR where
      promote m = λc f → m >>= λa → c a f
      observe m = m (λa f → return a) (fail "false")

Figure 8: A context-passing implementation of BACKTR.



Consequently, solve op = op (<) (return []). Now, instead
of providing solve as an additional observer function we
promote it into the backtracking monad.

sols :: (Monad m) ⇒ BACKTR m a → BACKTR m [a]
sols m = π (m (<) (return []))

This way we can use the all-solutions collecting function as
if it were a new computational primitive. Since π is a monad
morphism, we furthermore know that sols satisfies suitable
variants of (S1)–(S3). Note that the implementation of sols
makes non-trivial use of rank-2 types. If we used a variant
of BACKTR that is parameterized with the answer type,
then sols could not be assigned a type t a → t [a] for some t.
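
The following is a minimal sketch of Fig. 8 and of sols in GHC Haskell; it is not the paper's own code. Several names are assumptions made for the sake of a compilable example: the newtype wrapper BacktrT (GHC does not accept class instances for a polymorphic type synonym), the function orElse for the choice operator, plain functions instead of the classes Backtr and Transformer, and MonadFail as the home of fail. The operator written (<) above is taken to prepend a solution to the list computed by the failure continuation.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad (ap, liftM)

-- A computation takes a success continuation and a failure continuation,
-- both living in the underlying monad m; the answer type b is universally
-- quantified (rank-2).
newtype BacktrT m a = BacktrT
  { runBacktrT :: forall b. (a -> m b -> m b) -> m b -> m b }

instance Functor (BacktrT m) where
  fmap = liftM

instance Applicative (BacktrT m) where
  pure a = BacktrT (\c -> c a)
  (<*>)  = ap

instance Monad (BacktrT m) where
  m >>= k = BacktrT (\c -> runBacktrT m (\a -> runBacktrT (k a) c))

-- Failure: ignore the success continuation, resume the failure continuation.
false :: BacktrT m a
false = BacktrT (\_ -> id)

-- Choice: n becomes the failure continuation of m.
orElse :: BacktrT m a -> BacktrT m a -> BacktrT m a
orElse m n = BacktrT (\c -> runBacktrT m c . runBacktrT n c)

promote :: Monad m => m a -> BacktrT m a
promote m = BacktrT (\c f -> m >>= \a -> c a f)

-- Observe the first solution, or fail in the underlying monad.
observe :: MonadFail m => BacktrT m a -> m a
observe m = runBacktrT m (\a _ -> return a) (fail "false")

-- Collect all solutions and promote the result back into BacktrT.
-- The answer type is instantiated to [a], which needs the rank-2 type.
sols :: Monad m => BacktrT m a -> BacktrT m [a]
sols m = promote (runBacktrT m (\a rest -> fmap (a :) rest) (return []))
```

With Maybe as the underlying monad, observe (sols m) yields all solutions of m in order, wrapped in Just.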


5. ADDING CONTROL 

Let us extend our language by two additional Prolog-like
control constructs. The first, called cut and denoted ‘!’,
allows us to reduce the search space by dynamically pruning
unwanted computation paths. The second, termed call, is
provided for controlling the effect of cut. Both constructs
are introduced as a subclass of Backtr.


class (Backtr m) ⇒ Cut m where
  !        :: m ()
  cutfalse :: m a
  call     :: m a → m a

  !        = return () ¦ cutfalse
  cutfalse = ! >> false


The operational reading of ‘!’ and call is as follows. The
cut succeeds exactly once and returns (). As a side-effect
it discards all previous alternatives. The operation call
delimits the effect of cut: call m executes m; if the cut is
invoked in m, it discards only the choices made since m was
called. The class definition contains a third operation, called
cutfalse, which captures a common programming idiom in
Prolog, the so-called cut-fail combination [14].

Note that instances of the class Cut must define either
‘!’ or cutfalse. The default definitions already employ our
knowledge about the properties of the operations, which
we shall consider next. We sketch the axiomatization only
briefly; for a more in-depth treatment the interested reader
is referred to [5]. The cut is characterized by the following
three equations.


(! >> m) ¦ n   = ! >> m          (!1)
! >> (m ¦ n)   = m ¦ ! >> n      (!2)
! >> return () = !               (!3)

The first equation formalizes our intuition that a cut
discards past choice points, i.e. alternatives which appear
‘above’ or to its left. On the other hand, the cut does not
affect future choice points, i.e. alternatives which appear to
its right. This fact is captured by (!2). Axiom (!3) simply
records that cut returns (). An immediate consequence of the
axioms is ! = return () ¦ ! >> false, which explains the default
definition of cut. To see why this relation holds replace m by
return () and n by false in (!2).
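
Spelled out, the calculation behind this consequence can be sketched as follows. It uses only (!3), (!2), and the unit law (B2), read here as m ¦ false = m (the form in which it is applied in Fig. 7). The LaTeX rendering, with \mid for the choice operator and \gg for sequencing, is an illustrative choice and assumes the amsmath package.

```latex
\begin{align*}
  {!} &= {!} \gg \mathit{return}\ ()
      && \text{by (!3)} \\
      &= {!} \gg (\mathit{return}\ () \mid \mathit{false})
      && \text{by (B2)} \\
      &= \mathit{return}\ () \mid ({!} \gg \mathit{false})
      && \text{by (!2) with } m = \mathit{return}\ (),\ n = \mathit{false}
\end{align*}
```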

The operation cutfalse enjoys algebraic properties which
are somewhat easier to remember: cutfalse is a left zero of
both (>>=) and (¦).

cutfalse >>= k = cutfalse    (CP1)
cutfalse ¦ m   = cutfalse    (CP2)

The default definitions use the fact that ‘!’ and cutfalse are
interdefinable. Likewise, the two sets of axioms are
interchangeable. We may either define cutfalse = ! >> false and
take the equations for ‘!’ as axioms (the laws for cutfalse
are then simple logical consequences) or vice versa.
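
As an illustration of the first direction, the two left-zero laws can be calculated from the cut axioms once cutfalse is defined as ! >> false. The sketch below assumes (M3) is associativity of bind and (B4) is false >>= k = false, which is how both laws are used in the derivations of this paper; the LaTeX notation (\mid for choice, amsmath for align*) is again an illustrative choice.

```latex
\begin{align*}
  \mathit{cutfalse} \mathbin{>\!>\!=} k
    &= ({!} \gg \mathit{false}) \mathbin{>\!>\!=} k
    && \text{definition of } \mathit{cutfalse} \\
    &= {!} \gg (\mathit{false} \mathbin{>\!>\!=} k)
    && \text{by (M3)} \\
    &= {!} \gg \mathit{false} = \mathit{cutfalse}
    && \text{by (B4)} \\[1ex]
  \mathit{cutfalse} \mid m
    &= ({!} \gg \mathit{false}) \mid m
    && \text{definition of } \mathit{cutfalse} \\
    &= {!} \gg \mathit{false} = \mathit{cutfalse}
    && \text{by (!1)}
\end{align*}
```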

Finally, call is required to satisfy:

call false          = false                   (C1)
call (return a ¦ m) = return a ¦ call m       (C2)
call (! >> m)       = call m                  (C3)
call (m ¦ cutfalse) = call m                  (C4)
call (π m >>= k)    = π m >>= (call · k).     (C5)


Thus, call m behaves essentially like m except that any cut
inside m has only local effect. It remains to lay down how
the new operations are observed in the underlying monad.

ω (call m) = ω m    (O6)

Note that we need not specify the observation of ‘!’ and
cutfalse since (C3), (C4), and (O6) imply ω (! >> m) = ω m
and ω (m ¦ cutfalse) = ω m.

5.1 A term implementation 

The free term implementation faces two problems, one
technical and one fundamental. Let us consider the technical
problem first. Inspecting the type signature of cut, we find
that cut cannot be turned into a constructor, because it does
not have the right type. If we define a type, say, CUT m a,
then ‘!’ must have exactly this type. Alas, its type signature
only allows for a substitution instance, i.e. CUT m (). Here,
we stumble over the general problem that Haskell’s data
construct is not capable of expressing arbitrary polymorphic
term algebras. Fortunately, the axioms save the day. Since
‘!’ can be expressed in terms of cutfalse and this operation
has a polymorphic type, we turn cutfalse into a constructor.

data CUT m a = Return a
             | ∀b.(CUT m b) :>>= (b → CUT m a)
             | False
             | CutFalse
             | CUT m a :¦ CUT m a
             | Call (CUT m a)
             | Promote (m a)


Turning to the definition of ω we encounter a problem of a
more fundamental nature. For a start, we discover that the
term ω (call m :>>= k) cannot be simplified. If we make a
further case distinction on m, we end up with
ω (call (call m :>>= k₁) :>>= k₂), which is not reducible either.
The crux is that we have no axiom that specifies the interaction
of call with (>>=). And rightly so. Each call opens a new scope
for cut. Hence, we cannot reasonably expect that nested calls
can be collapsed. This suggests defining two interpreters, one
for ω and one for call, which means, of course, that the
implementation is no longer based on the free term algebra. The
resulting code, which is mostly straightforward, appears in
Fig. 9. The equations involving cutfalse use the fact that
cutfalse is a left zero of both (>>=) and (¦), and that call
maps cutfalse to false. Note that ω falls back on call to
avoid duplication of code.

5.2 A simplified term implementation 

For the sake of completeness, here is the simplified term 
algebra, which augments the type BACKTR of Sec. 4.2 with 
an additional constructor for cutfalse. 

data CUT m a = False
             | CutFalse
             | Cons a (CUT m a)
             | ∀b.PromoteBind (m b) (b → CUT m a)

In essence, we have lists with two different terminators, 
False and CutFalse. Interestingly, exactly this structure 
(without PromoteBind) has been used to give a denotational 
semantics for Prolog with cut [1], where cutfalse and call are 
termed esc and unesc. 
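
To make the "lists with two terminators" reading concrete, here is a minimal sketch of the pure fragment, i.e. without PromoteBind and hence without an underlying monad. The type name CutList, the constructor names Nil and CutNil, and the function name append for the choice operator are assumptions of this sketch, not names from the paper.

```haskell
import Control.Monad (ap, liftM)

-- Lists with two terminators: Nil plays the role of False,
-- CutNil the role of CutFalse.
data CutList a = Nil | CutNil | Cons a (CutList a)
  deriving (Eq, Show)

-- Choice. CutNil is a left zero of choice: alternatives to its
-- right are discarded.
append :: CutList a -> CutList a -> CutList a
append Nil        n = n
append CutNil     _ = CutNil
append (Cons a m) n = Cons a (append m n)

instance Functor CutList where
  fmap = liftM

instance Applicative CutList where
  pure a = Cons a Nil
  (<*>)  = ap

instance Monad CutList where
  Nil      >>= _ = Nil
  CutNil   >>= _ = CutNil              -- left zero of bind
  Cons a m >>= k = k a `append` (m >>= k)

false, cutfalse :: CutList a
false    = Nil
cutfalse = CutNil

-- Default definition of cut: succeed once, then cut-fail.
cut :: CutList ()
cut = pure () `append` cutfalse

-- call delimits the effect of a cut by turning CutNil into Nil.
call :: CutList a -> CutList a
call Nil        = Nil
call CutNil     = Nil
call (Cons a m) = Cons a (call m)
```

In this model (cut >> m) `append` n evaluates to m followed by CutNil, so the alternative n is pruned, while wrapping the left operand in call restores ordinary failure at the end of m.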

5.3 A context-passing implementation 

We have seen that the realization of cut and call is more
demanding since there is no way to simplify nested
invocations of call. With regard to the context-passing
implementation this means that we must consider an infinite
number of possible contexts. Using a grammar-like notation we
can characterize the set of all possible contexts as follows.

C ::= ω (• >>= k ¦ f) | C[call (• >>= k ¦ f)]

A context is either simple or of the form C[call (• >>= k ¦ f)]
where C is the enclosing context. Thus, contexts are
organized in a list- or stack-like fashion. As usual we will
represent operations as functions from contexts to
observations. The main difference to Sec. 4.3 is that each
operation must now consider two different contexts and that the
contexts are recursively defined. Note, however, the duality
between the term and the context-passing implementation:
In Sec. 5.1 we had two interpreters, call and ω, and each
interpreter had to consider each operation. Here we have two
contexts and each operation must consider each context.

data CUT m a = Return a
             | ∀b.(CUT m b) :>>= (b → CUT m a)
             | False
             | CutFalse
             | CUT m a :¦ CUT m a
             | Promote (m a)

instance Cut (CUT m) where
  cutfalse                          = CutFalse
  call (Return a)                   = Return a
  call (Return a :>>= k)            = call (k a)
  call ((m :>>= k₁) :>>= k₂)        = call (m :>>= (λa → k₁ a :>>= k₂))
  call (False :>>= k)               = False
  call (CutFalse :>>= k)            = False
  call ((m :¦ n) :>>= k)            = call ((m :>>= k) :¦ (n :>>= k))
  call (Promote m :>>= k)           = Promote m :>>= (call · k)
  call False                        = False
  call CutFalse                     = False
  call (Return a :¦ f)              = Return a :¦ call f
  call ((Return a :>>= k) :¦ f)     = call (k a :¦ f)
  call (((m :>>= k₁) :>>= k₂) :¦ f) = call ((m :>>= (λa → k₁ a :>>= k₂)) :¦ f)
  call ((False :>>= k) :¦ f)        = call f
  call ((CutFalse :>>= k) :¦ f)     = False
  call (((m :¦ n) :>>= k) :¦ f)     = call ((m :>>= k) :¦ ((n :>>= k) :¦ f))
  call ((Promote m :>>= k) :¦ f)    = Promote m :>>= λa → call (k a :¦ f)
  call (False :¦ f)                 = call f
  call (CutFalse :¦ f)              = False
  call ((m :¦ n) :¦ f)              = call (m :¦ (n :¦ f))
  call (Promote m :¦ f)             = Promote m :¦ call f
  call (Promote m)                  = Promote m

instance Transformer CUT where
  promote                     = Promote
  observe m                   = observe' (call m)

observe'                      :: (Monad m) ⇒ CUT m a → m a
observe' (Return a)           = return a
observe' (Promote m :>>= k)   = m >>= (observe' · k)
observe' False                = fail "false"
observe' (Return a :¦ f)      = return a
observe' (Promote m :¦ f)     = m
observe' (Promote m)          = m

Figure 9: A term implementation of CUT.

Turning to the implementation details we will see that
the greatest difficulty is to get the types right. The contexts
are represented by a recursive datatype with two
constructors: OBCC (which is an acronym for observe-bind-choice
context) and CBCC (call-bind-choice context). The first
takes two arguments, the success and the failure
continuation, while the second expects three arguments, the two
continuations and the representation of the enclosing
context. In order to infer their types it is useful to consider the
specification of the context-passing implementation
beforehand. The specification is similar to the one given in Sec. 4.3
except that we have two clauses, one for each context.

op (OBCC κ φ) = ω (op >>= c ¦ f)
    ⇐ φ = ω f                                                (4)
    ∧ ∀f' φ'. (∀a. κ a φ' = ω (c a ¦ f')) ⇐ φ' = ω f'         (5)
op · CBCC κ φ = call (op >>= c ¦ f)
    ⇐ φ = call f                                             (6)
    ∧ ∀f' φ'. (∀a. κ a φ' = call (c a ¦ f')) ⇐ φ' = call f'    (7)

The first clause closely corresponds to the specification of
Sec. 4.3. For that reason we may assign the components of
OBCC κ φ the same types: φ has type m b and κ has type
a → m b → m b where b is the answer type. This implies
that the type of contexts must be parameterized with m, a,
and b.

data C m a b = OBCC (a → m b → m b) (m b) | ...

The second clause of the specification has essentially the
same structure as the first one. The main difference is that
the components dwell in the transformed monad rather than
in the underlying monad. Furthermore, CBCC additionally
contains the enclosing context which may have a different
type. To illustrate, consider the context C[call (• >>= c ¦ f)]
of type C m a b. If we assume that the enclosing context C
has type C m i b (there is no reason to require that C has
the same argument type as the entire context, but it must
have the same answer type), then φ has type CUT m i and
κ has type a → CUT m i → CUT m i. This motivates the
following definition.

data C m a b = OBCC (a → m b → m b) (m b)
             | ∀i.CBCC (a → CUT m i → CUT m i) (CUT m i) (C m i b)
type CUT m a = ∀b.C m a b → m b

Note that the intermediate type is represented by an
existentially quantified variable. The mutually recursive types
C and CUT are somewhat mind-boggling as they involve
both universal and existential quantification, a combination
of features the author has not seen before.

Now that we have the types right, we can address the
derivation of the various operations. Except for π the
calculations are analogous to those of Sec. 4.3. For π m we
must conduct an inductive proof to show that m propagates
through the stack of contexts, i.e. (π m >>= k) c = m >>= λa →
k a c for every context c. The proof is left as an exercise to
the reader. To derive cut we reason:

! · CBCC κ φ
= { specification and assumptions (6) & (7) }
call (! >>= c ¦ f)
= { (!3), (M3), and (M1) }
call (! >> c () ¦ f)
= { (!1) and (!2) }
call (c () ¦ ! >> false)
= { assumption (7) }
κ () (call (! >> false))
= { (C3) and (C1) }
κ () false.


The derivation for the context OBCC proceeds in an
analogous fashion. For call we obtain:

call m
= { (M2) and (B2) }
call (m >>= return ¦ false)
= { specification }
m · CBCC (λa φ' → call (return a ¦ f')) (call false)
= { (C1) and (C2) }
m · CBCC (λa φ' → return a ¦ call f') false
= { φ' = call f' }
m · CBCC (λa φ' → return a ¦ φ') false
= { definition cons }
m · CBCC cons false.

Thus, call installs a new context with cons and false as the
initial success and failure continuations. The complete
implementation appears in Fig. 10. Note that most of the monad
operations pattern match on the context. This fact sets the
implementation apart from continuation passing style (CPS),
where the context is an anonymous function that cannot be
inspected. By contrast, CPS-based implementations [3, 10] use
three continuations (a success, a failure, and a cut continuation).

6. CONCLUSION 

Naturally, most of the credit goes to J. Hughes for
introducing two wonderful techniques for deriving programs
from their specification. Many of the calculations given in
this paper already appear in [6], albeit specialized to
monads. However, the step from monads to monad transformers
is not a big one and this is one of the pleasant findings. To
be able to derive an implementation of Prolog’s control core
from a given axiomatization is quite remarkable. We have
furthermore applied the techniques to derive state monad
transformers, STATE, and exception monad transformers,
EXC. In both cases the techniques worked well.

Some work remains to be done though. We did not
address the problem of promotion in general. It is well known
that different combinations of transformers generally lead to
different semantics of the operations involved. For instance,
composing STATE with BACKTR yields a backtracking
monad with a backtrackable state, which is characterized
as follows.

store s >> false   = false
store s >> (m ¦ n) = store s >> m ¦ store s >> n

Reversing the order of the two transformers results in a
global state, which enjoys a different axiomatization.

store s >> (m ¦ n) = store s >> m ¦ n

For both variants it is straightforward to derive an
implementation from the corresponding specification: in the
first case (¦) is promoted through STATE, in the second
case store is promoted through BACKTR. Unfortunately,
some harder cases remain, where the author has not been
able to derive a promotion in a satisfying way. The
problematic operations are, in general, those where the interaction
with (>>=) is not explicitly specified. For instance, it is not
clear how to derive the promotion of call through the state
monad transformer.
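
The difference between the two orders of composition can be observed on a small program. The sketch below is illustrative only: the hand-rolled StateT, the wrapper BacktrT, and the names storeS, fetchS, orElseB, promoteB, and allSols are assumptions of this example rather than the paper's STATE transformer or its operations. Both variants run store 1 >> ((store 2 >> false) ¦ fetch) and collect all solutions.

```haskell
{-# LANGUAGE RankNTypes #-}
import Control.Monad (ap, liftM)
import Data.Functor.Identity (Identity, runIdentity)

newtype BacktrT m a = BacktrT
  { runBacktrT :: forall b. (a -> m b -> m b) -> m b -> m b }
instance Functor (BacktrT m) where fmap = liftM
instance Applicative (BacktrT m) where
  pure a = BacktrT (\c -> c a)
  (<*>)  = ap
instance Monad (BacktrT m) where
  m >>= k = BacktrT (\c -> runBacktrT m (\a -> runBacktrT (k a) c))

newtype StateT s m a = StateT { runStateT :: s -> m (a, s) }
instance Monad m => Functor (StateT s m) where fmap = liftM
instance Monad m => Applicative (StateT s m) where
  pure a = StateT (\s -> return (a, s))
  (<*>)  = ap
instance Monad m => Monad (StateT s m) where
  m >>= k = StateT (\s -> runStateT m s >>= \(a, s') -> runStateT (k a) s')

falseB :: BacktrT m a
falseB = BacktrT (\_ -> id)

orElseB :: BacktrT m a -> BacktrT m a -> BacktrT m a
orElseB m n = BacktrT (\c -> runBacktrT m c . runBacktrT n c)

promoteB :: Monad m => m a -> BacktrT m a
promoteB m = BacktrT (\c f -> m >>= \a -> c a f)

allSols :: Monad m => BacktrT m a -> m [a]
allSols m = runBacktrT m (\a rest -> fmap (a :) rest) (return [])

storeS :: Monad m => s -> StateT s m ()
storeS s = StateT (\_ -> return ((), s))

fetchS :: Monad m => StateT s m s
fetchS = StateT (\s -> return (s, s))

-- STATE over BACKTR: choice is promoted through the state transformer,
-- so each alternative starts from the state at the choice point.
backtrackable :: [Int]
backtrackable = runIdentity (allSols (fmap fst (runStateT prog 0)))
  where
    prog        = storeS 1 >> ((storeS 2 >> failS) `orElseS` fetchS)
    failS       = StateT (\_ -> falseB)
    orElseS m n = StateT (\s -> runStateT m s `orElseB` runStateT n s)

-- BACKTR over STATE: store is promoted through the backtracking
-- transformer, so updates made in a failed branch stay visible.
global :: [Int]
global = fst (runIdentity (runStateT (allSols prog) 0))
  where
    prog    = store 1 >> ((store 2 >> falseB) `orElseB` promoteB fetchS)
    store s = promoteB (storeS s)
```

In this sketch backtrackable evaluates to [1], since the update store 2 is undone when its branch fails, whereas global evaluates to [2].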


data Ctx m a b = OBCC (a → m b → m b) (m b)
               | ∀i.CBCC (a → CUT m i → CUT m i) (CUT m i) (Ctx m i b)
type CUT m a   = ∀b.Ctx m a b → m b

instance (Monad m) ⇒ Monad (CUT m) where
  return a  = λctx₀ → case ctx₀ of
                OBCC κ φ     → κ a φ
                CBCC κ φ ctx → κ a φ ctx
  m >>= k   = λctx₀ → case ctx₀ of
                OBCC κ φ     → m (OBCC (λa φ' → k a (OBCC κ φ')) φ)
                CBCC κ φ ctx → m (CBCC (λa φ' → k a · CBCC κ φ') φ ctx)

instance (Monad m) ⇒ Backtr (CUT m) where
  false     = λctx₀ → case ctx₀ of
                OBCC κ φ     → φ
                CBCC κ φ ctx → φ ctx
  m ¦ n     = λctx₀ → case ctx₀ of
                OBCC κ φ     → m (OBCC κ (n (OBCC κ φ)))
                CBCC κ φ ctx → m (CBCC κ (n · CBCC κ φ) ctx)

instance (Monad m) ⇒ Cut (CUT m) where
  !         = λctx₀ → case ctx₀ of
                OBCC κ φ     → κ () (fail "false")
                CBCC κ φ ctx → κ () false ctx
  call m    = λctx₀ → m (CBCC cons false ctx₀)

instance Transformer CUT where
  promote m = λctx₀ → case ctx₀ of
                OBCC κ φ     → m >>= λa → κ a φ
                CBCC κ φ ctx → m >>= λa → κ a φ ctx
  observe m = m (OBCC (λa φ → return a) (fail "false"))

Figure 10: A context-passing implementation of CUT.
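
For readers who want to experiment, the following is a sketch of how Fig. 10 might be transcribed into GHC Haskell. It is an adaptation under stated assumptions, not the paper's code: CutT is a newtype wrapper (so the selector runCutT appears wherever the figure applies a computation to a context), orElse and cut stand for the choice operator and ‘!’, the operations are plain functions rather than class methods, and fail comes from MonadFail.

```haskell
{-# LANGUAGE RankNTypes, ExistentialQuantification #-}
import Control.Monad (ap, liftM)

-- A context is an observe context, or a call context stacked on an
-- enclosing context with an existentially hidden intermediate type i.
data Ctx m a b
  = OBCC (a -> m b -> m b) (m b)
  | forall i. CBCC (a -> CutT m i -> CutT m i) (CutT m i) (Ctx m i b)

-- A computation maps its calling context to an observation.
newtype CutT m a = CutT { runCutT :: forall b. Ctx m a b -> m b }

instance Functor (CutT m) where
  fmap = liftM

instance Applicative (CutT m) where
  pure a = CutT (\ctx0 -> case ctx0 of
    OBCC c f     -> c a f
    CBCC c f ctx -> runCutT (c a f) ctx)
  (<*>) = ap

instance Monad (CutT m) where
  m >>= k = CutT (\ctx0 -> case ctx0 of
    OBCC c f     -> runCutT m (OBCC (\a f' -> runCutT (k a) (OBCC c f')) f)
    CBCC c f ctx -> runCutT m
      (CBCC (\a f' -> CutT (\ctx' -> runCutT (k a) (CBCC c f' ctx'))) f ctx))

false :: CutT m a
false = CutT (\ctx0 -> case ctx0 of
  OBCC _ f     -> f
  CBCC _ f ctx -> runCutT f ctx)

orElse :: CutT m a -> CutT m a -> CutT m a
orElse m n = CutT (\ctx0 -> case ctx0 of
  OBCC c f     -> runCutT m (OBCC c (runCutT n (OBCC c f)))
  CBCC c f ctx -> runCutT m
    (CBCC c (CutT (\ctx' -> runCutT n (CBCC c f ctx'))) ctx))

-- Cut: continue with (), but replace the failure continuation by
-- plain failure of the current scope.
cut :: MonadFail m => CutT m ()
cut = CutT (\ctx0 -> case ctx0 of
  OBCC c _     -> c () (fail "false")
  CBCC c _ ctx -> runCutT (c () false) ctx)

-- call opens a new scope for cut by pushing a fresh call context.
call :: CutT m a -> CutT m a
call m = CutT (\ctx0 -> runCutT m (CBCC cons false ctx0))
  where cons a f = pure a `orElse` f

promote :: Monad m => m a -> CutT m a
promote m = CutT (\ctx0 -> case ctx0 of
  OBCC c f     -> m >>= \a -> c a f
  CBCC c f ctx -> m >>= \a -> runCutT (c a f) ctx)

observe :: MonadFail m => CutT m a -> m a
observe m = runCutT m (OBCC (\a _ -> return a) (fail "false"))
```

With Maybe as the underlying monad, observe ((cut >> false) `orElse` return 2) gives Nothing because the cut prunes the second alternative, while observe (call (cut >> false) `orElse` return 2) gives Just 2 because call confines the cut to its argument.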